Information extraction for the geospatial domain
Abstract
Geospatial knowledge is becoming an essential part of software applications, primarily because of the importance of mobile devices and of location-based queries on the World Wide Web. Context models are one way to disseminate geospatial data in a digital, machine-readable representation. One key challenge is acquiring and updating such data, since physical sensors cannot be used to collect it on a large scale, and doing the required work manually is time-consuming and expensive. However, a lot of geospatial data already exists in textual form and can be used instead. The question is how to extract such information from texts in order to integrate it into context models. In this thesis we tackle this issue and provide new approaches, which were implemented as prototypes and evaluated.
The first challenge in analyzing geospatial data in texts is identifying geospatial entities, also called toponyms. This task can be divided into several steps. The first step, called spotting, marks possible candidates in the text. Gazetteers are the key component for that task, but they have to be augmented by linguistically motivated methods to enable the spotting of inflected names. A second step is needed, since the spotting process cannot resolve ambiguous entities. For instance, London can be a city or a surname; we call this a geo/non-geo ambiguity. There are also geo/geo ambiguities, e.g. Fulda (city) vs. Fulda (river).
For our experiments, we prepared a new dataset that contains mentions of street names. Each mention was manually annotated; one part of the data was used to develop methods for toponym recognition, and the remaining part was used to evaluate performance. The results showed that machine-learning-based classifiers perform well at resolving the geo/non-geo ambiguity. To tackle the geo/geo ambiguity, we have to ground toponyms by finding the corresponding real-world objects.
In this work we present such approaches in a formal description and in a (partial) prototypical implementation, e.g., the recognition of vernacular named regions (like old town or financial district).
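As an illustration of the spotting step described in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes a toy gazetteer (`GAZETTEER`), a hand-picked list of inflectional suffixes (`SUFFIXES`), and a hypothetical helper `spot`; none of these names come from the original work. It shows how a gazetteer lookup combined with simple suffix stripping can mark inflected candidates, and why a second disambiguation step is still needed.

```python
import re

# Hypothetical toy gazetteer: lower-cased name -> possible feature types.
GAZETTEER = {
    "london": ["city"],
    "fulda": ["city", "river"],
}

# Assumed inflectional suffixes to try, in order (illustrative only).
SUFFIXES = ("", "s", "'s")


def spot(text):
    """Return (surface form, gazetteer key, candidate types) per candidate."""
    candidates = []
    # A token is a run of letters, optionally followed by a possessive 's.
    for match in re.finditer(r"[^\W\d_]+(?:'s)?", text):
        token = match.group(0)
        lowered = token.lower()
        for suffix in SUFFIXES:
            if suffix and not lowered.endswith(suffix):
                continue
            # Strip the suffix and look the remaining stem up in the gazetteer.
            stem = lowered[: len(lowered) - len(suffix)] if suffix else lowered
            if stem in GAZETTEER:
                candidates.append((token, stem, GAZETTEER[stem]))
                break
    return candidates


if __name__ == "__main__":
    for candidate in spot("Jack London visited Fuldas old town."):
        print(candidate)
```

In this sketch the surname in "Jack London" is still marked as a candidate (a geo/non-geo ambiguity), and "Fuldas" maps to both a city and a river (a geo/geo ambiguity). Spotting alone resolves neither case.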
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Distributed Geospatial Information Services-architectures, Standards, and Research Issues
It is estimated that more than 80% of data that human beings have collected so far are geospatial data. In order for the geospatial data to be useful, information has to be extracted from the data and converted to knowledge. However, currently it is very difficult for general users to obtain geospatial data and turn them into useful information and knowledge. In order for geospatial information...
Interoperability for Geospatial Analysis: a Semantics and Ontology-based Approach
Information extraction and integration from heterogeneous, autonomous data resources are major requirements for many spatial applications. Geospatial analysis for scientific discovery involves identification of relevant information resources, extraction and fusion of requisite subsets of the information, application of spatial analytical techniques and visualization of the results in an appropr...
A Framework for the Representation of Geospatial Image Processing Operations
Research advances in geospatial automated image analysis tools and feature extraction algorithms have matured in recent times to levels of practical applicability. The consolidation of such tools and algorithms would result in enhanced image analysis capabilities. This has motivated research in developing formalisms for representation of process information that can assist in integrating tools ...
Salient regions detection in satellite images using the combination of MSER local features detector and saliency models
Nowadays, due to improvements in the quality of satellite images, automatic target detection in these images has attracted many researchers' attention. Remote-sensing images contain various geospatial targets; these targets are generally man-made and have a structure distinct from their surrounding areas. Different methods have been developed for automatic target detection. In most of these met...
Expert Systems Applied to Problems in Geographic Information Systems: Introduction, Review and Prospects
This paper discusses the nature of expert systems, with special attention to the construction of expert systems. We identify four major problem domains of geographic information systems in which expert system technology has been applied: map design, terrain/feature extraction, geographic database management, and geographic decision support systems. Efforts in each problem domain are critically review...
Publication date: 2014